Abstract

Recent results in Compressive Sensing have shown that, under certain conditions,
the solution to an underdetermined system of linear equations with sparsity-based
regularization can be accurately recovered by solving convex relaxations of the
original problem. In this work, we present a novel primal-dual analysis on a class
of sparsity minimization problems. We show that the Lagrangian bidual (i.e.,
the Lagrangian dual of the Lagrangian dual) of the sparsity minimization problems
can be used to derive interesting convex relaxations: the bidual of the
$\ell_0$-minimization problem is the $\ell_1$-minimization problem; and the bidual of the
$\ell_{0,1}$-minimization problem for enforcing group sparsity on structured data is the
$\ell_{1,\infty}$-minimization problem. The analysis provides a means to compute per-instance
non-trivial lower bounds on the (group) sparsity of the desired solutions. In a
real-world application, the bidual relaxation improves the performance of a
sparsity-based classification framework applied to robust face recognition.


1  Introduction 

The last decade has seen a renewed interest in the problem of solving an underdetermined system of
equations $Ax = b$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, where $m \ll n$, by regularizing its solution to be sparse,
i.e., having very few non-zero entries. Specifically, if one aims to find $x$ with the least number of
non-zero entries that solves the linear system, the problem is known as $\ell_0$-minimization:

$$(P_0): \quad x_0^* = \arg\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = b. \tag{1}$$


The problem $(P_0)$ is intended to seek entry-wise sparsity in $x$ and is known to be NP-hard in general.
In the Compressive Sensing (CS) literature, it has been shown that the solution to (1) can often be
obtained by solving a more tractable linear program, namely, $\ell_1$-minimization [4, 8]:

$$(P_1): \quad x_1^* = \arg\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = b. \tag{2}$$
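As an illustration of why (2) is a linear program, the following minimal sketch splits $x$ into non-negative parts and hands the result to a generic LP solver. The use of SciPy's `linprog`, the helper name `l1_min`, and the toy instance are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def l1_min(A, b):
    """Solve min ||x||_1 s.t. Ax = b as a linear program.

    Uses the split x = x_plus - x_minus with x_plus, x_minus >= 0,
    so that ||x||_1 = sum(x_plus + x_minus) at the optimum.
    """
    m, n = A.shape
    cost = np.ones(2 * n)                 # objective: 1^T (x_plus + x_minus)
    A_eq = np.hstack([A, -A])             # A (x_plus - x_minus) = b
    res = linprog(cost, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.x[:n] - res.x[n:]


# Toy instance (assumed for illustration): one equation, two unknowns.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])
x1 = l1_min(A, b)                         # the sparse solution (0, 1)
```

On this toy instance the $\ell_1$ solution coincides with the sparsest solution, since $x = (0, 1)$ satisfies $x_1 + 2x_2 = 2$ with a single non-zero entry.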


This unconventional equivalence relation between $(P_0)$ and $(P_1)$, together with the more recent numerical
solutions [3, 16] for efficiently recovering high-dimensional sparse signals, has made this a very competitive
research area in CS. Its broad applications include sparse error correction [6], compressive
imaging [23], image denoising and restoration [11, 17], and face recognition [13, 21], to name a few.

In addition to enforcing entry-wise sparsity in a linear system of equations, the notion of group
sparsity has attracted increasing attention in recent years [12, 13, 18]. In this case, one assumes that
the matrix $A$ has some underlying structure, and can be grouped into blocks: $A = [A_1 \cdots A_K]$,
where $A_k \in \mathbb{R}^{m \times d_k}$ and $\sum_{k=1}^{K} d_k = n$. Accordingly, the vector $x$ is split into several blocks as
$x^T = [x_1^T \cdots x_K^T]$, where $x_k \in \mathbb{R}^{d_k}$. In this case, it is of interest to estimate $x$ with the least
number of blocks containing non-zero entries. The group sparsity minimization problem is posed as


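$$(P_{0,p}): \quad x_{0,p}^* = \arg\min_{x} \sum_{k=1}^{K} I(\|x_k\|_p > 0), \quad \text{s.t.} \quad Ax = [A_1 \cdots A_K] \begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix} = b, \tag{3}$$

where $I(\cdot) \in \mathbb{R}$ is the indicator function. Since the expression $\sum_{k=1}^{K} I(\|x_k\|_p > 0)$ can be written
as $\| [\|x_1\|_p \cdots \|x_K\|_p] \|_0$, it is also denoted as $\ell_{0,p}(x)$, the $\ell_{0,p}$-norm of $x$.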

Enforcing group sparsity exploits the problem's underlying structure and can improve the solution's
interpretability. For example, in a sparsity-based classification (SBC) framework applied to face
recognition, the columns of $A$ are vectorized training images of human faces that can be naturally
grouped into blocks corresponding to different subject classes, $b$ is a vectorized query image, and
the entries in $x$ represent the coefficients of the linear combination of all the training images for
reconstructing $b$. Group sparsity lends itself naturally to this problem since it is desirable to use images
of the smallest number of subject classes to reconstruct and subsequently classify a query image.

Furthermore, the problem of robust face recognition has considered an interesting modification
known as the cross-and-bouquet (CAB) model: $b = Ax + e$, where $e \in \mathbb{R}^m$ represents possible
sparse error corruption on the observation $b$ [22]. It can be argued that the CAB model can be
solved as a group sparsity problem in (3), where the coefficients of $e$ would be the $(K+1)$th group.
However, this problem has a trivial solution for $e = b$ and $x = 0$, which would have the smallest
possible group sparsity. Hence, it is necessary to further regularize the entry-wise sparsity in $e$.

To this effect, one considers a mixture of the previous two cases, where one aims to enforce
entry-wise sparsity as well as group sparsity such that $x$ has very few non-zero blocks and the
reconstruction error $e$ is also sparse. The mixed sparsity minimization problem can be posed as

$$(MP_{0,p}): \quad \{x_{0,p}^*, e_0^*\} = \arg\min_{(x,e)} \ell_{0,p}(x) + \gamma \|e\|_0, \quad \text{s.t.} \quad Ax = b + e, \tag{4}$$

where $\gamma > 0$ controls the tradeoff between the entry-wise sparsity and group sparsity.

Due to the use of the counting norm, the optimization problems in (3) and (4) are also NP-hard in
general. Hence, several recent works have focused on developing tractable convex relaxations for
these problems. In the case of group sparsity, the relaxation involves replacing the $\ell_{0,p}$-norm with
the $\ell_{1,p}$-norm, where $\ell_{1,p}(x) = \| [\|x_1\|_p \cdots \|x_K\|_p] \|_1 = \sum_{k=1}^{K} \|x_k\|_p$. These relaxations are
also used for the mixed sparsity case [13].
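The two group norms defined above can be evaluated directly from their definitions. The sketch below is a minimal NumPy illustration; the helper names `split_blocks`, `l0p` and `l1p` and the example vector are hypothetical, not part of the paper.

```python
import numpy as np


def split_blocks(x, block_sizes):
    """Split x into consecutive blocks x_1, ..., x_K of the given sizes d_k."""
    return np.split(x, np.cumsum(block_sizes)[:-1])


def l0p(x, block_sizes, p=2):
    """Number of blocks with ||x_k||_p > 0, i.e. the l_{0,p} count."""
    return int(sum(np.linalg.norm(xk, ord=p) > 0
                   for xk in split_blocks(x, block_sizes)))


def l1p(x, block_sizes, p=2):
    """Sum of the block norms ||x_k||_p, i.e. the l_{1,p} norm."""
    return float(sum(np.linalg.norm(xk, ord=p)
                     for xk in split_blocks(x, block_sizes)))


# Example (assumed): three blocks of sizes 2, 2, 1; only the middle one is active.
x = np.array([0.0, 0.0, 3.0, -4.0, 0.0])
sizes = [2, 2, 1]
n_blocks = l0p(x, sizes)              # 1 non-zero block, for any p
v1 = l1p(x, sizes, p=1)               # |3| + |-4| = 7
v2 = l1p(x, sizes, p=2)               # sqrt(9 + 16) = 5
vinf = l1p(x, sizes, p=np.inf)        # max(|3|, |-4|) = 4
```

The count $\ell_{0,p}(x)$ does not depend on $p$, whereas the value of the surrogate $\ell_{1,p}(x)$ does, which is the dependence the subsequent discussion is concerned with.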

In this work, we are interested in deriving and analyzing convex relaxations for general sparsity
minimization problems. In the entry-wise case, the main theoretical understanding of the link between
the original NP-hard problem in (1) and its convex relaxation has been given by the simple fact that
the $\ell_1$-norm is a convex surrogate of the $\ell_0$-norm. However, in the group sparsity case, a similar
relaxation produces a family of convex surrogates, i.e., $\ell_{1,p}(x)$, whose value depends on $p$. This
raises the question of whether there is a preferable value of $p$ for the relaxation of the group sparsity
minimization problem. In fact, we consider the following more important question:

Is  there  a  unified  framework  for  deriving  convex  relaxations  of  general  sparsity  recovery  problems? 
1.1 Paper contributions

We present a new optimization-theoretic framework based on Lagrangian duality for deriving convex
relaxations of sparsity minimization problems. Specifically, we introduce a new class of equivalent
optimization problems for $(P_0)$, $(P_{0,p})$ and $(MP_{0,p})$, and derive the Lagrangian duals of the original
NP-hard problems. We then consider the Lagrangian dual of the Lagrangian dual to get a new
optimization problem that we term the Lagrangian bidual of the primal problem. We show
that the Lagrangian biduals are convex relaxations of the original sparsity minimization problems.
Importantly, we show that the Lagrangian biduals for the $(P_0)$ and $(P_{0,p})$ problems correspond to
minimizing the $\ell_1$-norm and the $\ell_{1,\infty}$-norm, respectively.

Since the Lagrangian duals for $(P_0)$, $(P_{0,p})$ and $(MP_{0,p})$ are linear programs, there is no duality
gap between the Lagrangian duals and the corresponding Lagrangian biduals. Therefore, the
bidual-based convex relaxations can be interpreted as maximizing the Lagrangian duals of the original
sparsity  minimization  problems.  This  provides  new  interpretations  for  the  relaxations  of  sparsity 
minimization  problems.  Moreover,  since  the  Lagrangian  dual  of  a  minimization  problem  provides  a 
lower  bound  for  the  optimal  value  of  the  primal  problem,  we  show  that  the  optimal  objective  value 
of  the  convex  relaxation  provides  a  non-trivial  lower  bound  on  the  sparsity  of  the  true  solution  to  the 
primal  problem. 

2  Lagrangian  biduals  for  sparsity  minimization  problems 

In  what  follows,  we  will  derive  the  Lagrangian  bidual  for  the  mixed  sparsity  minimization  problem, 
which  generalizes  the  entry-wise  sparsity  and  group  sparsity  cases  (also  see  Section  3).  Specifically, 
we  will  derive  the  Lagrangian  bidual  for  the  following  optimization  problem: 

$$x^* = \arg\min_{x} \sum_{k=1}^{K} \left[ \alpha_k I(\|x_k\|_p > 0) + \beta_k \|x_k\|_0 \right], \quad \text{s.t.} \quad [A_1 \cdots A_K] \begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix} = b, \tag{5}$$

where $\forall k = 1, \ldots, K$: $\alpha_k \ge 0$ and $\beta_k \ge 0$. Given any unique, finite solution $x^*$ to (5), there
exists a constant $M > 0$ such that the absolute values of the entries of $x^*$ are at most $M$, namely,
$\|x^*\|_\infty \le M$. Note that if (5) does not have a unique solution, it might not be possible to choose a
finite-valued $M$ that upper bounds all the solutions. In this case, a finite-valued $M$ may be viewed
as a regularization term for the desired solution. To this effect, we consider the following modified
version of (5) where we introduce the box constraint that $\|x\|_\infty \le M$:

$$x^*_{\text{primal}} = \arg\min_{x} \sum_{k=1}^{K} \left[ \alpha_k I(\|x_k\|_p > 0) + \beta_k \|x_k\|_0 \right], \quad \text{s.t.} \quad Ax = b \ \text{and} \ \|x\|_\infty \le M, \tag{6}$$

where $M$ is chosen as described above to ensure that the optimal values of (6) and (5) are the same.

Primal problem. We will now frame an equivalent optimization problem for (6), for which we
introduce some new notation. Let $z \in \{0, 1\}^n$ be an entry-based sparsity indicator for $x$, namely,
$z_i = 0$ if $x_i = 0$ and $z_i = 1$ otherwise. We also introduce a group-based sparsity indicator vector
$g \in \{0, 1\}^K$, whose $k$th entry $g_k$ denotes whether the $k$th block $x_k$ contains non-zero entries or not,
namely, $g_k = 0$ if $x_k = 0$ and $g_k = 1$ otherwise. To express this constraint, we introduce a matrix
$\Pi \in \{0, 1\}^{n \times K}$, such that $\Pi_{ij} = 1$ if the $i$th entry of $x$ belongs to the $j$th block and $\Pi_{ij} = 0$
otherwise. Finally, we denote the positive component and negative component of $x$ as $x_+ \ge 0$ and
$x_- \ge 0$, respectively, such that $x = x_+ - x_-$.

Given these definitions, we see that (6) can be reformulated as

$$\{x_+^*, x_-^*, z^*, g^*\} = \arg\min_{\{x_+, x_-, z, g\}} \left[ \alpha^T g + \beta^T z \right], \quad \text{s.t.} \quad \text{(a)}\ x_+ \ge 0,\ \text{(b)}\ x_- \ge 0,\ \text{(c)}\ g \in \{0,1\}^K,$$
$$\text{(d)}\ z \in \{0,1\}^n,\ \text{(e)}\ A(x_+ - x_-) = b,\ \text{(f)}\ \Pi g \ge \tfrac{1}{M}(x_+ + x_-),\ \text{and (g)}\ z \ge \tfrac{1}{M}(x_+ + x_-), \tag{7}$$

where $\alpha = [\alpha_1 \cdots \alpha_K]^T \in \mathbb{R}^K$ and $\beta = [\cdots \underbrace{\beta_k \cdots \beta_k}_{d_k \text{ times}} \cdots]^T \in \mathbb{R}^n$.
Constraints (a)-(d) are used to enforce the aforementioned conditions on the values of the solution.
While  constraint  (e)  enforces  the  condition  that  the  original  system  of  linear  equations  is  satisfied, 
the  constraints  (f)  and  (g)  ensure  that  the  group  sparsity  indicator  g  and  the  entry-wise  sparsity 
indicator  z  are  consistent  with  the  entries  of  x. 
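To make the role of the indicator variables concrete, the following sketch builds the block membership matrix and maps a given $x$ to the variables of (7), then checks constraints (f) and (g) numerically. The function names `membership_matrix` and `primal_point` and the example values are hypothetical choices made for illustration.

```python
import numpy as np


def membership_matrix(block_sizes):
    """Pi in {0,1}^{n x K}: Pi[i, j] = 1 iff entry i of x lies in block j."""
    n, K = sum(block_sizes), len(block_sizes)
    Pi = np.zeros((n, K))
    start = 0
    for j, d in enumerate(block_sizes):
        Pi[start:start + d, j] = 1.0
        start += d
    return Pi


def primal_point(x, block_sizes):
    """Map x to the variables (x_plus, x_minus, z, g) used in (7)."""
    Pi = membership_matrix(block_sizes)
    x_plus, x_minus = np.maximum(x, 0.0), np.maximum(-x, 0.0)
    z = (x != 0).astype(float)            # entry-wise sparsity indicator
    g = (Pi.T @ z > 0).astype(float)      # group sparsity indicator
    return x_plus, x_minus, z, g, Pi


# Example (assumed): three blocks of sizes 2, 2, 1.
x = np.array([0.0, 0.0, 3.0, -4.0, 0.0])
sizes = [2, 2, 1]
M = 4.0                                   # any M >= ||x||_inf works
x_plus, x_minus, z, g, Pi = primal_point(x, sizes)
ok_f = bool(np.all(Pi @ g >= (x_plus + x_minus) / M))   # constraint (f)
ok_g = bool(np.all(z >= (x_plus + x_minus) / M))        # constraint (g)
```

Because every entry satisfies $|x_i| / M \le 1$, a non-zero entry forces the corresponding $z_i$ and $g_k$ to equal 1 once they are restricted to $\{0, 1\}$, which is how (f) and (g) tie the indicators to $x$.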


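Lagrangian dual. The Lagrangian function for (7) is given as

$$L(x_+, x_-, z, g, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5) = \alpha^T g + \beta^T z - \lambda_1^T x_+ - \lambda_2^T x_- + \lambda_3^T (b - A x_+ + A x_-)$$
$$\qquad + \lambda_4^T \left( \tfrac{1}{M}(x_+ + x_-) - \Pi g \right) + \lambda_5^T \left( \tfrac{1}{M}(x_+ + x_-) - z \right), \tag{8}$$

where $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, $\lambda_4 \ge 0$, and $\lambda_5 \ge 0$. In order to obtain the Lagrangian dual function, we
need to minimize $L(\cdot)$ with respect to $x_+$, $x_-$, $g$ and $z$ [2]. Notice that if the coefficients of $x_+$ and
$x_-$, i.e., $\tfrac{1}{M}(\lambda_4 + \lambda_5) - A^T \lambda_3 - \lambda_1$ and $\tfrac{1}{M}(\lambda_4 + \lambda_5) + A^T \lambda_3 - \lambda_2$, are non-zero, the minimization
of $L(\cdot)$ with respect to $x_+$ and $x_-$ is unbounded below. To this effect, the constraints that these
coefficients are equal to 0 form constraints on the dual variables. Next, consider the minimization
of $L(\cdot)$ with respect to $g$. Since each entry $g_k$ only takes values 0 or 1, its optimal value $\hat{g}_k$ that
minimizes $L(\cdot)$ is given as

$$\hat{g}_k = \begin{cases} 0 & \text{if } \alpha_k - (\Pi^T \lambda_4)_k > 0, \text{ and} \\ 1 & \text{otherwise.} \end{cases} \tag{9}$$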

A  similar  expression  can  be  computed  for  the  minimization  with  respect  to  z.  As  a  consequence, 
the  Lagrangian  dual  problem  can  be  derived  as 

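$$\{\lambda_i^*\}_{i=1}^{5} = \arg\max_{\{\lambda_i\}_{i=1}^{5}} \left[ \lambda_3^T b + \mathbf{1}^T \min\{0, \alpha - \Pi^T \lambda_4\} + \mathbf{1}^T \min\{0, \beta - \lambda_5\} \right], \quad \text{s.t.}$$
$$\text{(a)}\ \forall i = 1, 2, 4, 5:\ \lambda_i \ge 0,\quad \text{(b)}\ \tfrac{1}{M}(\lambda_4 + \lambda_5) - A^T \lambda_3 - \lambda_1 = 0,$$
$$\text{and (c)}\ \tfrac{1}{M}(\lambda_4 + \lambda_5) + A^T \lambda_3 - \lambda_2 = 0. \tag{10}$$

This can be further simplified by rewriting it as the following linear program:

$$\{\lambda_i^*\}_{i=3}^{7} = \arg\max_{\{\lambda_i\}_{i=3}^{7}} \left[ \lambda_3^T b + \mathbf{1}^T \lambda_6 + \mathbf{1}^T \lambda_7 \right], \quad \text{s.t.} \quad \text{(a)}\ \lambda_4 \ge 0,\ \text{(b)}\ \lambda_5 \ge 0,\ \text{(c)}\ \lambda_6 \le 0,\ \text{(d)}\ \lambda_7 \le 0,$$
$$\text{(e)}\ \lambda_6 \le \alpha - \Pi^T \lambda_4,\ \text{(f)}\ \lambda_7 \le \beta - \lambda_5,\ \text{and (g)}\ -\tfrac{1}{M}(\lambda_4 + \lambda_5) \le A^T \lambda_3 \le \tfrac{1}{M}(\lambda_4 + \lambda_5). \tag{11}$$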

Notice that we have made two changes in going from (10) to (11). First, we have replaced constraints
(b) and (c) in (10) with the constraint (g) in (11) and eliminated $\lambda_1$ and $\lambda_2$ from (11). Second, we
have introduced variables $\lambda_6$ and $\lambda_7$ to encode the “min” operator in the objective function of (10).

Lagrangian bidual. We will now consider the Lagrangian dual of (11), which will be referred to as
the Lagrangian bidual of (7). It can be verified that the Lagrangian dual of (11) is given as

$$\{x_+^*, x_-^*, z^*, g^*\} = \arg\min_{\{x_+, x_-, z, g\}} \alpha^T g + \beta^T z \quad \text{s.t.} \quad \text{(a)}\ x_+ \ge 0,\ \text{(b)}\ x_- \ge 0,\ \text{(c)}\ g \in [0,1]^K,$$
$$\text{(d)}\ z \in [0,1]^n,\ \text{(e)}\ A(x_+ - x_-) = b,\ \text{(f)}\ \Pi g \ge \tfrac{1}{M}(x_+ + x_-),\ \text{and (g)}\ z \ge \tfrac{1}{M}(x_+ + x_-). \tag{12}$$

Notice that in going from (7) to (12), the discrete valued variables $z$ and $g$ have been relaxed to take
real values between 0 and 1. Given that $z \le 1$ and noting that $x$ can be represented as $x = x_+ - x_-$,
we can conclude from constraint (g) in (12) that the solution $x^*$ satisfies $\|x^*\|_\infty \le M$. Moreover,
given that $g$ and $z$ are relaxed to take real values, we see that the optimal values for $g_k^*$ and $z_i^*$ are
$\tfrac{1}{M}\|x_k^*\|_\infty$ and $\tfrac{1}{M}|x_i^*|$, respectively. Hence, we can eliminate constraints (f) and (g) by replacing $z$
and $g$ by these optimal values. It can then be verified that solving (12) is equivalent to solving the
problem:

$$x^*_{\text{bidual}} = \arg\min_{x} \frac{1}{M} \sum_{k=1}^{K} \left[ \alpha_k \|x_k\|_\infty + \beta_k \|x_k\|_1 \right] \quad \text{s.t.} \quad \text{(a)}\ Ax = b \ \text{and (b)}\ \|x\|_\infty \le M. \tag{13}$$

This is the Lagrangian bidual for (7).
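The relaxed problem (12) can be passed to a generic LP solver essentially as written. The sketch below stacks the variables as $[x_+, x_-, z, g]$ and uses SciPy's `linprog`; the function name `solve_bidual`, the variable ordering, and the toy instance are assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def solve_bidual(A, b, block_sizes, alpha, beta, M):
    """Solve the relaxed problem (12) with variables v = [x_plus, x_minus, z, g]."""
    m, n = A.shape
    K = len(block_sizes)
    Pi = np.zeros((n, K))                                  # block membership matrix
    Pi[np.arange(n), np.repeat(np.arange(K), block_sizes)] = 1.0
    I = np.eye(n)
    cost = np.concatenate([np.zeros(2 * n), beta, alpha])  # beta^T z + alpha^T g
    A_eq = np.hstack([A, -A, np.zeros((m, n + K))])        # constraint (e)
    A_ub = np.vstack([
        np.hstack([I / M, I / M, np.zeros((n, n)), -Pi]),  # constraint (f)
        np.hstack([I / M, I / M, -I, np.zeros((n, K))]),   # constraint (g)
    ])
    bounds = [(0, None)] * (2 * n) + [(0, 1)] * (n + K)    # constraints (a)-(d)
    res = linprog(cost, A_ub=A_ub, b_ub=np.zeros(2 * n),
                  A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.x[:n] - res.x[n:2 * n], res.fun


# Toy instance (assumed): one equation, one block containing two entries.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])
sizes, M = [2], 2.0
# Entry-wise case: alpha = 0, beta = 1 gives (1/M) * min ||x||_1 = 1/2.
x_e, val_e = solve_bidual(A, b, sizes, np.zeros(1), np.ones(2), M)
# Group case: alpha = 1, beta = 0 gives (1/M) * min ||x||_inf = 1/3.
x_g, val_g = solve_bidual(A, b, sizes, np.ones(1), np.zeros(2), M)
```

Both optimal values are below 1, the sparsity and group sparsity of the sparsest solution $x = (0, 1)$ of this instance, in line with the lower-bound property of the bidual.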
3 Theoretical results from the biduality framework

In  this  section,  we  first  describe  some  properties  of  the  biduality  framework  in  general.  We  will  then 
focus  on  some  important  results  for  the  special  cases  of  entry-wise  sparsity  and  group  sparsity. 

Theorem  1.  The  optimal  value  of  the  Lagrangian  bidual  in  (13)  is  a  lower  bound  on  the  optimal 
value  of  the  NP-hard  primal  problem  in  (7). 

Proof. Since there is no duality gap between a linear program and its Lagrangian dual [2], the optimal
values of the Lagrangian dual in (11) and the Lagrangian bidual in (13) are the same. Moreover,
we know that the optimal value of a primal minimization problem is always bounded below by the
optimal value of its Lagrangian dual [2]. We hence have the required result. □

Remark 1. Since the original primal problem in (7) is NP-hard, we note that the duality gap between
the primal and its dual in (11) is non-zero in general. Moreover, we notice that as we increase
$M$ (i.e., take a more conservative estimate), the optimal value of the primal is unchanged, but the optimal
value of the bidual in (13) decreases. Hence, the duality gap increases as $M$ increases.

$M$ in (6) should preferably be equal to $\|x^*_{\text{primal}}\|_\infty$, which may not be possible to estimate accurately
in practice. Therefore, it is of interest to analyze the effect of taking a very conservative estimate of
$M$, i.e., choosing a large value for $M$. In what follows, we show that taking a conservative estimate
of $M$ is equivalent to dropping the box constraint in the bidual.

For this purpose, consider the following modification of the bidual:

$$x^*_{\text{bidual-conservative}} = \arg\min_{x} \sum_{k=1}^{K} \left[ \alpha_k \|x_k\|_\infty + \beta_k \|x_k\|_1 \right] \quad \text{s.t.} \quad Ax = b, \tag{14}$$

where we have essentially dropped the box constraint (b) in (13). It is easy to verify that $\forall M \ge
\max\{\|x^*_{\text{primal}}\|_\infty, \|x^*_{\text{bidual-conservative}}\|_\infty\}$, we have that $x^*_{\text{bidual}} = x^*_{\text{bidual-conservative}}$. Therefore, we see
that taking a conservative value of $M$ is equivalent to solving the modified bidual in (14).

3.1  Results  for  entry-wise  sparsity  minimization 

Notice that by substituting $\alpha_1 = \cdots = \alpha_K = 0$ and $\beta_1 = \cdots = \beta_K = 1$, the optimization problem
in (5) reduces to the entry-wise sparsity minimization problem in (1). Hence, the Lagrangian bidual
to the $M$-regularized entry-wise sparsity problem $(P_0)$ is:

$$x^*_{\text{entry-wise-bidual}} = \arg\min_{x} \frac{1}{M} \|x\|_1 \quad \text{s.t.} \quad \text{(a)}\ Ax = b \ \text{and (b)}\ \|x\|_\infty \le M. \tag{15}$$

More importantly, we can also conclude from (14) that solving the Lagrangian bidual to the
entry-wise sparsity problem with a conservative estimate of $M$ is equivalent to solving the problem:

$$x^*_{\text{entry-wise-bidual-conservative}} = \arg\min_{x} \|x\|_1 \quad \text{s.t.} \quad Ax = b, \tag{16}$$

which is precisely the well-known $\ell_1$-norm relaxation for $(P_0)$. Our framework therefore provides
a new interpretation for this relaxation:

Remark 2. The $\ell_1$-norm minimization problem in (16) is the Lagrangian bidual of the $\ell_0$-norm
minimization problem in (1), and solving (16) is equivalent to maximizing the dual of (1).

We further note that we can now use the solution of (15) to derive a non-trivial lower bound for the
primal objective function, which is precisely the sparsity of the desired solution. More specifically,
we can use Theorem 1 to conclude the following result:

Corollary 1. Let $x_0^*$ be the solution to (1). We have that $\forall M \ge \|x_0^*\|_\infty$, the sparsity of $x_0^*$, i.e.,
$\|x_0^*\|_0$, is bounded below by $\tfrac{1}{M}\|x^*_{\text{entry-wise-bidual}}\|_1$.

Due  to  the  non-zero  duality  gap  in  the  primal  entry-wise  sparsity  minimization  problem,  the  above 
lower  bound  provided  by  Corollary  1  is  not  tight  in  general. 
3.2 Results for group sparsity minimization


Notice that by substituting $\alpha_1 = \cdots = \alpha_K = 1$ and $\beta_1 = \cdots = \beta_K = 0$, the optimization problem
in (5) reduces to the group sparsity minimization problem in (3). Hence, the Lagrangian bidual of
the group sparsity problem is:

$$x^*_{\text{group-bidual}} = \arg\min_{x} \frac{1}{M} \sum_{k=1}^{K} \|x_k\|_\infty \quad \text{s.t.} \quad \text{(a)}\ Ax = b \ \text{and (b)}\ \|x\|_\infty \le M. \tag{17}$$

As in the case of entry-wise sparsity above, solving the bidual to the group sparsity problem with a
conservative estimate of $M$ is equivalent to solving:

$$x^*_{\text{group-bidual-conservative}} = \arg\min_{x} \sum_{k=1}^{K} \|x_k\|_\infty \quad \text{s.t.} \quad Ax = b, \tag{18}$$

which is the convex $\ell_{1,\infty}$-norm relaxation of the $\ell_{0,p}$-minimization problem (3). In other words, the biduality
framework selects the $\ell_{1,\infty}$-norm out of the entire family of $\ell_{1,p}$-norms as the convex surrogate of
the $\ell_{0,p}$-norm.

Finally, we use Theorem 1 to show that the solution obtained by minimizing the $\ell_{1,\infty}$-norm provides
a lower bound for the group sparsity.

Corollary 2. Let $x^*_{0,p}$ be the solution to (3). For any $M \ge \|x^*_{0,p}\|_\infty$, the group sparsity of $x^*_{0,p}$, i.e.,
$\ell_{0,p}(x^*_{0,p})$, is bounded below by $\tfrac{1}{M}\ell_{1,\infty}(x^*_{\text{group-bidual}})$.

The $\ell_{1,\infty}$-norm seems to be an interesting choice for computing the lower bound of the group
sparsity, as compared to other $\ell_{1,p}$-norms for finite $p < \infty$. For example, consider the case when
$p = 1$, where the $\ell_{1,p}$-norm is equivalent to the $\ell_1$-norm. Assume that $A$ consists of a single block
with several columns so that the maximum number of non-zero blocks is 1. Denote the solution to
the $\ell_1$-minimization problem as $x_1^*$. It is possible to construct examples (also see Figure 1) where
$\tfrac{1}{M}\|x_1^*\|_1 > 1$. Hence, it is unclear in general if the solutions obtained by minimizing
$\ell_{1,p}$-norms for finite-valued $p < \infty$ can help provide lower bounds for the group sparsity.
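A small numerical instance of this single-block situation is sketched below. The matrix, right-hand side, and choice of $M$ are assumptions picked so that the feasible set is a single point; the sketch only illustrates that the scaled $\ell_{1,1}$ value can exceed the group sparsity while the scaled $\ell_{1,\infty}$ value cannot.

```python
import numpy as np

# Single block (K = 1): A is the 2 x 2 identity, so Ax = b has the unique
# solution x = b and every l_{1,p} relaxation returns this same x.
A = np.eye(2)
b = np.array([1.0, 1.0])
x = np.linalg.solve(A, b)

M = np.max(np.abs(x))                  # tightest valid box bound, here M = 1
group_sparsity = int(np.any(x != 0))   # one non-zero block

bound_p1 = np.sum(np.abs(x)) / M       # (1/M) * l_{1,1}(x)   = 2
bound_pinf = np.max(np.abs(x)) / M     # (1/M) * l_{1,inf}(x) = 1
```

Here the $p = 1$ value equals 2, which is larger than the group sparsity of 1 and therefore not a valid lower bound, while the $p = \infty$ value equals 1.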

4  Experiments 

We  now  present  experiments  to  evaluate  the  bidual  framework  for  minimizing  entry-wise  sparsity 
and  mixed  sparsity.  We  present  experiments  on  synthetic  data  to  show  that  our  framework  can  be 
used  to  compute  non-trivial  lower  bounds  for  the  entry-wise  sparsity  minimization  problem.  We 
then consider the face recognition problem where we compare the performance of the bidual-based
$\ell_{1,\infty}$-norm relaxation with that of the $\ell_{1,2}$-norm relaxation for mixed sparsity minimization.

We use boxplots to provide a concise representation of our results' statistics. The top and bottom
edges of a boxplot for a set of values indicate the maximum and minimum of the values. The bottom
and top extents of the box indicate the 25th and 75th percentiles. The red mark in the box indicates
the median and the red crosses outside the boxes indicate potential outliers.

Entry-wise sparsity. We now explore the practical implications of Corollary 1 through synthetic
experiments. We randomly generate entries of $A \in \mathbb{R}^{128 \times 256}$ and $x_0 \in \mathbb{R}^{256}$ from a Gaussian
distribution with unit variance. The sparsity of $x_0$ is varied from 1 to 64 in steps of 3. We solve (15)
with $b = Ax_0$ using $M = M_0$, $2M_0$ and $5M_0$, where $M_0 = \|x_0\|_\infty$. We use Corollary 1 to compute
lower bounds on the true sparsity, i.e., $\|x_0\|_0$. We repeat this experiment 1000 times for each sparsity
level, and Figure 1 shows the boxplots for the bounds computed from these experiments.
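A single trial of this experiment can be sketched as follows. The LP encoding, the SciPy solver, the random seed, and the particular sparsity level are assumptions made for illustration; the text does not specify which solver was used.

```python
import numpy as np
from scipy.optimize import linprog


def entrywise_bound(A, b, M):
    """Return (1/M) * ||x||_1 at the optimum of (15)."""
    n = A.shape[1]
    # x = x_plus - x_minus with 0 <= x_plus, x_minus <= M enforces ||x||_inf <= M.
    res = linprog(np.ones(2 * n) / M, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, M)] * (2 * n), method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.fun


rng = np.random.default_rng(0)          # fixed seed, an arbitrary choice
m, n, k = 128, 256, 10                  # sizes from the text; k is one sparsity level
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = A @ x0
M0 = np.max(np.abs(x0))

# Lower bounds on ||x0||_0 from Corollary 1 for M = M0, 2*M0 and 5*M0.
bounds_for_M = {c: entrywise_bound(A, b, c * M0) for c in (1, 2, 5)}
```

Each value in `bounds_for_M` is at most `k`, since $x_0$ is feasible for (15) and $\tfrac{1}{M}\|x_0\|_1 \le \|x_0\|_0$ whenever $M \ge \|x_0\|_\infty$; the values can only shrink as the multiplier of $M_0$ grows.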

We first analyze the lower bounds computed when $M = M_0$, in Figure 1(a). As explained in Section
3.1, the bounds are not expected to be tight due to the duality gap. Notice that for extremely sparse
solutions, the maximum of the computed bounds is close to the true sparsity, but this diverges as
$x_0$ becomes less sparse. The median value of the bounds is much looser and we see that the median
also diverges as $x_0$ becomes less sparse. Furthermore, the computed lower bounds seem to grow
linearly as a function of the true sparsity. Similar trends are observed for $M = 2M_0$ and $5M_0$
in Figures 1(b) and 1(c), respectively. As expected from the discussion in Section 3.1, the bounds
become very loose as $M$ increases.
Figure 1: Results for computing the lower bounds on the true (black lines) entry-wise sparsity
$\|x_0\|_0$ obtained over 1000 trials. The bounds are computed by solving (15) and using Corollary 1
with $M = M_0$, $2M_0$ and $5M_0$, where $M_0 = \|x_0\|_\infty$. Notice that as expected from the discussion in
Section 3.1, the bounds are not tight due to the duality gap and become looser as $M$ increases.


In theory, we would like to have per-instance certificates-of-optimality of the computed solution,
where the lower bound is equal to the true sparsity $\|x_0\|_0$. Nonetheless, we note that this ability
to compute a per-instance non-trivial lower bound on the sparsity of the desired solution is an
important step forward with respect to the previous approaches that require pre-computing optimality
conditions for equivalence of solutions to the $\ell_0$-norm and $\ell_1$-norm minimization problems.

We  have  performed  a  similar  experiment  for  the  group  sparsity  case,  and  observed  that  the  bidual 
framework  is  able  to  provide  non-trivial  lower  bounds  for  the  group  sparsity  also. 

Mixed sparsity. We now evaluate the results of mixed sparsity minimization for the sparsity-based
face recognition problem, where the columns of $A$ represent training images from the $K$ face classes
$A_1, \cdots, A_K$ and $b \in \mathbb{R}^m$ represents a query image. We assume that a subset of pixel values in the
query image may be corrupted or disguised. Hence, the error in the image space is modeled by a
sparse error term $e$: $b = b_0 + e$, where $b_0$ is the uncorrupted image. A linear representation of the
query image forms the following linear system of equations:

$$b = Ax + e = [A_1 \cdots A_K \; I] \, [x_1^T \cdots x_K^T \; e^T]^T, \tag{19}$$

where $I$ is the $m \times m$ identity matrix. The goal of sparsity-based classification (SBC) is to minimize
the group sparsity in $x$ and the sparsity of $e$ such that the dominant non-zero coefficients in $x$ reveal
the membership of the ground-truth observation $b_0 = b - e$ [13, 21]. In our experiments, we solve
for $x$ and $e$ by solving the following optimization problem:

$$\{x^*_{1,p}, e^*_1\} = \arg\min_{\{x, e\}} \sum_{k=1}^{K} \|x_k\|_p + \gamma \|e\|_1 \quad \text{s.t.} \quad Ax + e = b. \tag{20}$$

Notice that for $p = \infty$, this reduces to solving a special case of the problem in (14), i.e., the bidual
relaxation of the mixed sparsity problem with a conservative estimate of $M$. In our experiments, we
set $\gamma = 0.01$ and compare the solutions to (20) obtained using $p = 2$ and $p = \infty$.

We  evaluate  the  algorithms  on  a  subset  of  the  AR  dataset  [1]  which  has  manually  aligned  frontal 
face images of size $83 \times 60$ for 50 male and 50 female subjects, i.e., $K = 100$ and $m = 4980$.
Each individual contributes 7 un-occluded training images, 7 un-occluded testing images and 12
occluded testing images. Hence, we have 700 training images and 1900 testing images. To compute
the number of non-zero blocks in the coefficient $x$ estimated for a testing image, we find the number
of blocks whose energy $\ell_2(x_k)$ is greater than a specified threshold.
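The block-counting rule described above can be sketched in a few lines. The function name `count_active_blocks` and the threshold value used in the example are hypothetical; the text does not state the threshold that was used.

```python
import numpy as np


def count_active_blocks(x, block_sizes, threshold):
    """Count blocks x_k whose l2 norm exceeds the given threshold."""
    blocks = np.split(x, np.cumsum(block_sizes)[:-1])
    energies = np.array([np.linalg.norm(xk) for xk in blocks])
    return int(np.sum(energies > threshold))


# Example (assumed): numerically tiny coefficients are not counted as active.
x = np.array([0.001, 0.0, 3.0, -4.0, 0.002])
sizes = [2, 2, 1]
n_active = count_active_blocks(x, sizes, threshold=0.01)   # only the middle block
n_nonzero = count_active_blocks(x, sizes, threshold=0.0)   # all three blocks
```

Thresholding matters in practice because a numerical solver rarely returns exact zeros, so a strict comparison against 0 would overcount the active blocks.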

The results of our experiments are presented in Figure 2. The solution obtained with $p = 2$ gives
better group sparsity of $x$. However, a sparser error $e$ is estimated with $p = \infty$. The number of
non-zero entities in a solution to (20), i.e., the number of non-zero blocks plus the number of
non-zero error entries, is lower for the solution obtained using $p = \infty$ than for that obtained using
$p = 2$. However, the primal mixed-sparsity objective value $\ell_{0,p}(x) + \gamma \|e\|_0$ (see (4)) is lower for
the solution obtained using $p = 2$.
(a) Group sparsity for $p = 2$  (b) Group sparsity for $p = \infty$  (c) Difference in group sparsities
(d) Entry-wise sparsity for $p = 2$  (e) Entry-wise sparsity for $p = \infty$  (f) Difference in entry-wise sparsities

Figure 2: Comparison of mixed sparsity of the solutions to (20) for $p = 2$ and $p = \infty$. We present
boxplots for group sparsity of $x$ and entry-wise sparsity of $e$. The differences are calculated as
(# non-zero blocks/elements for $p = 2$) - (# non-zero blocks/elements for $p = \infty$). We see that for
$p = 2$ we get better group sparsity of $x$, but we get a sparser error $e$ when we use $p = \infty$.


               p = 2                                     p = ∞
               #(correct results)  %(correct results)    #(correct results)  %(correct results)
un-occluded    655                 93.57%                663                 94.71%
occluded       643                 53.58%                691                 57.58%
total          1298                68.32%                1324                69.68%

Table 1: Classification results on the AR dataset using the solutions obtained by minimizing mixed
sparsity. The test set consists of 700 un-occluded images and 1200 occluded images.


We now compare the classification results obtained with the solutions $x$ computed in our
experiments. For classification, we consider the non-zero blocks in $x$ and then assign the query image to
the block, i.e., subject class, for which it gives the least $\ell_2$ residual $\|b - A_k x_k\|_2$. The results are
presented in Table 1. Notice that the classification results obtained with $p = \infty$ (the bidual
relaxation) are better than those obtained using $p = 2$. Since the classification of un-occluded images is
already very good using $p = 2$, classification with $p = \infty$ gives only a minor improvement in this
case. However, a more tangible improvement is noticed in the classification of the occluded images.
Therefore the classification with $p = \infty$ is in general better than that obtained with $p = 2$, which is
considered the state-of-the-art for sparsity-based classification [13].
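The classification rule described above, assigning the query to the active block with the smallest residual, can be sketched as follows. The function name `classify`, the default threshold, and the toy data are hypothetical; the residual follows the $\|b - A_k x_k\|_2$ form stated in the text.

```python
import numpy as np


def classify(A, b, x, block_sizes, threshold=0.0):
    """Return the index k of the active block minimizing ||b - A_k x_k||_2."""
    edges = np.concatenate([[0], np.cumsum(block_sizes)])
    best_k, best_res = None, np.inf
    for k in range(len(block_sizes)):
        lo, hi = edges[k], edges[k + 1]
        xk = x[lo:hi]
        if np.linalg.norm(xk) <= threshold:       # skip blocks treated as zero
            continue
        res = np.linalg.norm(b - A[:, lo:hi] @ xk)
        if res < best_res:
            best_k, best_res = k, res
    return best_k


# Toy data (assumed): two classes with 2 and 1 training columns.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
x = np.array([0.1, 0.1, 0.9])
label = classify(A, b, x, [2, 1])                 # class index 1 explains b best
```

In this toy case the second block reproduces most of $b$ on its own, so its residual is the smallest and the query is assigned to class index 1.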


5  Discussion 

We  have  presented  a  novel  analysis  of  several  sparsity  minimization  problems  which  allows  us  to 
interpret  several  convex  relaxations  of  the  original  NP-hard  primal  problems  as  being  equivalent  to 
maximizing  their  Lagrangian  duals.  The  pivotal  point  of  this  analysis  is  the  formulation  of  mixed- 
integer  programs  which  are  equivalent  to  the  original  primal  problems.  While  we  have  derived  the 
biduals for only a few sparsity minimization problems, the same techniques can also be used to
easily derive convex relaxations for other sparsity minimization problems [7].

An interesting result of our biduality framework is the ability to compute a per-instance certificate of
optimality by providing a lower bound for the primal objective function. This is in contrast to most
previous research, which aims to characterize either the subset of solutions or the set of conditions
for perfect sparsity recovery using the convex relaxations [5, 6, 8-10, 14, 15, 20]. In most cases, the
conditions are either weak or hard to verify. More importantly, these conditions need to be
pre-computed, as opposed to verifying the correctness of a solution at run-time. In light of this, we hope
that our proposed framework will prove an important step towards per-instance verification of the
solutions. Specifically, it is of interest in the future to explore tighter relaxations for the verification
of the solutions.
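As a rough illustration of what a per-instance certificate could look like, suppose the entries of every admissible solution are bounded in magnitude by a known constant M (an assumption made for this sketch). Then any such x satisfies ||x||_1 <= M ||x||_0, so the optimal value of the $\ell_1$ relaxation divided by M, rounded up, is a lower bound on the sparsity of every admissible solution. The helper names below are hypothetical, and the relaxation value is taken as an input, assumed to come from a convex solver.

```python
import math


def sparsity_lower_bound(relaxed_l1_value, M, tol=1e-9):
    """Lower bound on ||x||_0 from the optimal value of the l1 relaxation.

    Assumes |x_i| <= M for every admissible x, so ||x||_1 <= M * ||x||_0.
    `tol` guards against rounding up because of floating-point noise.
    """
    if M <= 0:
        raise ValueError("M must be positive")
    return max(0, math.ceil(relaxed_l1_value / M - tol))


def is_certified_optimal(x, relaxed_l1_value, M, zero_tol=1e-8):
    """True if the candidate x attains the lower bound, i.e. no admissible
    solution can be sparser under the stated assumption."""
    nnz = sum(1 for v in x if abs(v) > zero_tol)
    return nnz == sparsity_lower_bound(relaxed_l1_value, M)


if __name__ == "__main__":
    # Relaxation value 2.4 with M = 1: at least 3 non-zeros are needed.
    print(sparsity_lower_bound(2.4, 1.0))
    # A 2-sparse candidate matching a bound of 2 is certified.
    print(is_certified_optimal([0.0, 1.0, 0.0, -1.0], 2.0, 1.0))
```

When the check fails, the candidate is not necessarily suboptimal; the bound may simply be loose, which is why tighter relaxations are of interest.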

Acknowledgments 

This  research  was  supported  in  part  by  ARO  MURI  W911NF-06-1-0076,  ARL  MAST-CTA 
W911NF-08-2-0004,  NSF  CNS-0931805,  NSF  CNS-0941463  and  NSF  grant  0834470.  The  views 
and  conclusions  contained  in  this  document  are  those  of  the  authors  and  should  not  be  interpreted  as 
representing  the  official  policies,  either  expressed  or  implied,  of  the  Army  Research  Laboratory  or 
the U.S. Government. The U.S. Government is authorized to reproduce and distribute for Government
purposes notwithstanding any copyright notation herein.


References 

[1] A. M. Martinez and R. Benavente. The AR face database. CVC Technical Report #24, 1998.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[3] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling
of signals and images. SIAM Review, 51(1):34-81, 2009.

[4]  E.  Candes.  Compressive  sampling.  In  Proceedings  of  International  Congress  of  Mathematics,  2006. 

[5]  E.  Candes.  The  restricted  isometry  property  and  its  implications  for  compressed  sensing.  C.  R.  Acad.  Sci., 
Paris,  Series  I,  346:  589-592,  2008. 

[6] E. Candes and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory,
51(12):4203-4215, 2005.

[7] V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk. Sparse signal recovery using Markov random fields.
In NIPS, pages 257-264, 2008.

[8] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$
minimization. PNAS, 100(5):2197-2202, 2003.

[9] D. Donoho. For most large underdetermined systems of linear equations, the minimal $\ell_1$-norm
near-solution approximates the sparsest near-solution. Communications on Pure and Applied Mathematics,
2006.

[10] D. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations
in the presence of noise. IEEE Trans. on Information Theory, 52(1):6-18, 2006.

[11] M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE
Transactions on Image Processing, 18(1):27-35, 2009.

[12] Y. Eldar and M. Mishali. Robust recovery of signals from a structured union of subspaces. IEEE
Transactions on Information Theory, 55(11):5302-5316, 2009.

[13] E. Elhamifar and R. Vidal. Robust classification using structured sparse representation. In IEEE
Conference on Computer Vision and Pattern Recognition, 2011.

[14] A. Fletcher, S. Rangan, and V. Goyal. Necessary and sufficient conditions on sparsity pattern recovery.
IEEE Transactions on Information Theory, 55(12):5758-5772, 2009.

[15] A. Iouditski, F. K. Karzan, and A. Nemirovski. Verifiable conditions of $\ell_1$-recovery of sparse signals with
sign restrictions. ArXiv e-prints, 2009.

[16] I. Loris. On the performance of algorithms for the minimization of $\ell_1$-penalized functionals. Inverse
Problems, 25:1-16, 2009.

[17] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions
on Image Processing, 17(1):53-69, 2008.

[18] M. Stojnic, F. Parvaresh, and B. Hassibi. On the reconstruction of block-sparse signals with an optimal number
of measurements. IEEE Transactions on Signal Processing, 57(8):3075-3085, 2009.

[19] J. Tropp. Greed is Good: Algorithmic results for sparse approximation. IEEE Transactions on Information
Theory, 50(10):2231-2242, 2004.


[20]  G.  Reeves  and  M.  Gastpar.  Efficient  sparsity  pattern  recovery.  In  Proc.  30th  Symp.  on  Information  Theory 
Benelux,  2009. 

[21] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210-227, 2009.

[22] J. Wright and Y. Ma. Dense Error Correction via $\ell_1$-Minimization. IEEE Transactions on Information
Theory, 56(7), 2010.

[23] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk. Compressive
imaging for video representation and coding. In Picture Coding Symposium, 2006.

